perm filename WEBSPL.THS[UP,DOC] blob sn#726312 filedate 1983-10-25 generic text, type T, neo UTF8
Using Spell On WEB files.
	
	One of the important problems with using the spelling correcting program
SPELL on files designed for use with the WEB system is that there are parts of
these files that contain many nonwords.  It is possible to automatically 
recognize many of these sections.  Files for use with TeX suffer from similar
problems, but not as badly.  WEBSPL is a program that attempts to recognize
these areas containing nonwords and marks them so that SPELL will not check
the spelling of these areas.  Another program, SPLWEB, is necessary to unmark
these areas so as not to confuse TeX and/or WEB.  These programs were written
by Tom Spencer, spencer@score, and he will be maintaining them.  Address
complaints, praise, etc. to him.

	This is how to use the system.  Suppose that you wish to correct
the file FOO.WEB.  First you run WEBSPL and tell it that the WEBFILE
is FOO.WEB and that the SPELLFILE is FOO.SPL.  Then you run SPELL
saying that you want to correct FOO.SPL and that the corrected output
is to be put in the file FOO.SPC.  It is also necessary to specify that
SPELL is to be in PUB mode.  This requires typing `D' as one of the modes.
Finally, you run SPLWEB, telling it that the WEBFILE is FOO.SPC and that
the SPELLFILE is FOO.WEB.  It is also a good idea to delete FOO.SPL and
FOO.SPC as they are no longer necessary.

	Which parts of a WEB file are likely to contain nonwords?
Broadly speaking, there are two kinds of text in WEB files, TeX text
and PASCAL text.  The PASCAL text is sufficiently likely to contain nonwords
that it is worthless to try to correct the spelling of it.  The PASCAL text
in a WEB file consists of the definition part and the PASCAL part of a module
but not the module names and the comments.

	The TeX text is also likely to contain nonwords, but they can be 
isolated.  Specifically, control sequences are likely to be nonwords and math 
mode is likely to contain nonwords.  In addition, any text between vertical 
bars, '|', is PASCAL text and is not checked for spelling.

METHODS and LIMITATIONS

	The algorithms used by WEBSPL are not foolproof.  They and their 
limitations are described here for reference.  If WEBSPL gets confused
it will stay confused for a long time.  Certain error conditions are 
detected and used for synchronizing.  Error conditions cause a message 
to be printed in the ERRFILE and in the SPELLFILE.  The error messages
in the SPELLFILE begin with `.V ' and should be the only occurrence
of these characters at the beginning of a line.  The error messages are
deleted by SPLWEB.

	WEBSPL causes SPELL to ignore certain sections of text by using
SPELL's convention that in PUB mode any line beginning with a dot is to be
ignored.  WEBSPL does not check for dots at the beginning of a line.
Therefore, any line of the original file that begins with a dot will also be
ignored by SPELL.  There are more serious consequences as well.  SPLWEB
cannot tell dots that were in the
file to begin with from dots added by WEBSPL, except that all dots added by
WEBSPL are followed by one of the letters `O', `N', or `V' and always occur
at the beginning of a line.  Therefore any dot that might have been inserted 
by WEBSPL will be deleted by SPLWEB.  The properties described here are 
considered to be tolerable misfeatures.  
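
	The ambiguity described above can be illustrated with a small sketch.
It is not taken from SPLWEB itself; the function names are hypothetical and
the only facts assumed are the ones stated in this document: markers are a
dot at the beginning of a line followed by `O', `N', or `V', and error
messages begin with `.V '.

```python
# Illustrative sketch only; these helper names are hypothetical and are
# not part of WEBSPL or SPLWEB.
MARKER_LETTERS = ("O", "N", "V")


def could_be_webspl_marker(line):
    """True if the line starts with a dot followed by O, N, or V.

    Such a line cannot be told apart from one that WEBSPL marked, even if
    the dot was already present in the original file.
    """
    return len(line) >= 2 and line[0] == "." and line[1] in MARKER_LETTERS


def is_error_message(line):
    """True if the line has the `.V ' prefix used for error messages."""
    return line.startswith(".V ")
```

	For instance, an original line such as `.Note the dot' satisfies the
first test, so it would be treated as if WEBSPL had inserted the dot, while
`.see above' would not.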

	It is also possible to confuse WEBSPL about what is math mode and what 
is not.  WEBSPL's basic algorithm is to consider that an unmatched $ in TeX
text starts math mode.  Math mode ends at the next $ that is at the same level
of curly braces.  There are some exceptions to this rule.
First, all control sequences are ignored.  A control sequence is defined as
a backslash, \, followed by a nonletter or by any number of letters up to the
first nonletter.  Second, all text between an unignored % and the end of a
line is
ignored.  Third, all TeX characters are ignored between vertical bars, |.  
Thus, a vertical bar that would open PASCAL mode is ignored if it occurs after
a %, and a % inside PASCAL mode is likewise ignored.  There are many ways to
make this algorithm fail.
WEBSPL is likely to get confused in these cases.
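
	The rule and its three exceptions can be sketched as a single scan
over the text.  This is a minimal illustration written from the description
above, not WEBSPL's actual code; the function name is hypothetical, and the
sketch assumes plain TeX text with no WEB control codes and no special
treatment of $$.

```python
# Illustrative sketch of the math-mode rule described above.
# Not WEBSPL's implementation; find_math_spans is a hypothetical name.
def find_math_spans(text):
    """Return (start, end) index pairs of spans treated as math mode."""
    spans = []
    depth = 0           # current curly-brace nesting level
    math_start = None   # index of the opening $, or None outside math mode
    math_depth = 0      # brace level at which math mode was opened
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\\":
            # Control sequence: a backslash plus one nonletter,
            # or a backslash plus a run of letters.
            i += 1
            if i < n and text[i].isalpha():
                while i < n and text[i].isalpha():
                    i += 1
            else:
                i += 1
            continue
        if ch == "%":
            # TeX comment: skip the remainder of the line.
            while i < n and text[i] != "\n":
                i += 1
            continue
        if ch == "|":
            # PASCAL text: skip to the closing bar (or to the end of the
            # text if the bar is unmatched).
            close = text.find("|", i + 1)
            i = n if close == -1 else close + 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "$":
            if math_start is None:
                math_start, math_depth = i, depth
            elif depth == math_depth:
                spans.append((math_start, i + 1))
                math_start = None
        i += 1
    return spans
```

	Note that a $ inside a deeper brace group does not close math mode in
this sketch, and that an unmatched bar swallows the rest of the text, which
is one way a scanner of this kind can stay confused for a long time.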

	WEB control information is ignored in TeX comments, that is, in the
remainder of a line starting with an unignored %.  This is also a tolerable
misfeature since WEAVE and TANGLE do not understand TeX comments.

	WEBSPL and SPLWEB assume that the end-of-line mark is carriage return,
line feed.  Carriage returns not followed by line feeds may confuse either 
program.
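
	A file can be checked for such stray carriage returns before it is
given to either program.  The following is a minimal sketch, assuming the
file has been read as raw bytes; the function name is hypothetical.

```python
# Illustrative sketch; bare_carriage_returns is a hypothetical helper.
def bare_carriage_returns(data):
    """Return the offsets of carriage returns not followed by a line feed."""
    return [
        i
        for i, byte in enumerate(data)
        if byte == 0x0D and data[i + 1:i + 2] != b"\n"
    ]
```

	An empty result means every carriage return in the data is followed by
a line feed.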

	People using WEBSPL on TeX files should be aware that the characters
@ and | have significance in WEB and are likely to confuse WEBSPL if used 
carelessly.